798 research outputs found

    Model and Evaluation: Towards Fairness in Multilingual Text Classification

    Recently, more and more research has focused on addressing bias in text classification models. However, existing research mainly focuses on the fairness of monolingual text classification models, and research on fairness for multilingual text classification is still very limited. In this paper, we focus on the task of multilingual text classification and propose a debiasing framework for multilingual text classification based on contrastive learning. Our proposed method does not rely on any external language resources and can be extended to any other language. The model contains four modules: a multilingual text representation module, a language fusion module, a text debiasing module, and a text classification module. The multilingual text representation module uses a multilingual pre-trained language model to represent the text, the language fusion module uses contrastive learning to make the semantic spaces of different languages more consistent, and the text debiasing module uses contrastive learning to make the model unable to identify information about sensitive attributes. The text classification module performs the basic task of multilingual text classification. In addition, existing research on the fairness of multilingual text classification uses a relatively simple evaluation mode: fairness is evaluated with the same equality difference method used for monolingual models, that is, on a single language at a time. We propose a multi-dimensional fairness evaluation framework for multilingual text classification, which evaluates the model's monolingual equality difference, multilingual equality difference, multilingual equality performance difference, and the destructiveness of the fairness strategy. We hope that our work can provide a more general debiasing method and a more comprehensive evaluation framework for multilingual text fairness tasks.

    An Effective Deployment of Contrastive Learning in Multi-label Text Classification

    The effectiveness of contrastive learning in natural language processing tasks is yet to be fully explored and analyzed. Constructing positive and negative samples correctly and reasonably is the core challenge of contrastive learning, and discovering contrastive objects is even harder in multi-label text classification tasks. Very few contrastive losses have been proposed previously. In this paper, we investigate the problem from a different angle by proposing five novel contrastive losses for multi-label text classification tasks: Strict Contrastive Loss (SCL), Intra-label Contrastive Loss (ICL), Jaccard Similarity Contrastive Loss (JSCL), Jaccard Similarity Probability Contrastive Loss (JSPCL), and Stepwise Label Contrastive Loss (SLCL). We explore the effectiveness of contrastive learning for multi-label text classification tasks by employing these novel losses and provide a set of baseline models for deploying contrastive learning techniques on specific tasks. We further perform an interpretable analysis of our approach to show how different components of contrastive learning losses play their roles. The experimental results show that our proposed contrastive losses can bring improvement to multi-label text classification tasks. Our work also explores how contrastive learning should be adapted for multi-label text classification tasks.
    Comment: Accepted by ACL-Findings 2023, 13 pages
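As a rough illustration of the idea behind the Jaccard-based losses named above, the sketch below computes the Jaccard similarity between the label sets of two samples. Such a score could serve as a soft weight for how strongly a pair is treated as positive. The function name and the example labels are assumptions made for illustration; this is not the paper's implementation of JSCL or JSPCL.

```python
def jaccard_similarity(labels_a, labels_b):
    """Return |A ∩ B| / |A ∪ B| for two label collections (0.0 if both are empty)."""
    a, b = set(labels_a), set(labels_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical multi-label annotations for three samples.
sample_1 = {"sports", "politics"}
sample_2 = {"sports", "economy"}
sample_3 = {"health"}

print(jaccard_similarity(sample_1, sample_2))  # one shared label out of three distinct labels
print(jaccard_similarity(sample_1, sample_3))  # no shared labels
```

A graded score like this distinguishes partially overlapping label sets from identical or disjoint ones, which a strict "same labels or not" rule cannot do.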

    CL-XABSA: Contrastive Learning for Cross-lingual Aspect-based Sentiment Analysis

    As an extensively researched task in the field of natural language processing (NLP), aspect-based sentiment analysis (ABSA) predicts the sentiment expressed in a text relative to the corresponding aspect. Unfortunately, most languages lack sufficient annotation resources, so a growing number of recent studies focus on cross-lingual aspect-based sentiment analysis (XABSA). However, most recent studies concentrate only on cross-lingual data alignment instead of model alignment. To this end, we propose a novel framework, CL-XABSA: Contrastive Learning for Cross-lingual Aspect-Based Sentiment Analysis. Specifically, we design two contrastive strategies, token level contrastive learning of token embeddings (TL-CTE) and sentiment level contrastive learning of token embeddings (SL-CTE), to regularize the semantic spaces of the source and target languages to be more uniform. Since our framework can receive datasets in multiple languages during training, it can be adapted not only for the XABSA task but also for multilingual aspect-based sentiment analysis (MABSA). To further improve the performance of our model, we apply knowledge distillation, leveraging unlabeled data from the target language. In the distillation XABSA task, we further explore the comparative effectiveness of different data (source dataset, translated dataset, and code-switched dataset). The results demonstrate that the proposed method achieves a certain improvement in the three tasks of XABSA, distillation XABSA, and MABSA. For reproducibility, our code for this paper is available at https://github.com/GKLMIP/CL-XABSA

    An interpretability framework for Similar case matching

    Similar Case Matching (SCM) plays a pivotal role in the legal system by facilitating the efficient identification of similar cases for legal professionals. While previous research has primarily concentrated on enhancing the performance of SCM models, interpretability has been neglected. To bridge the gap, this study proposes an integrated pipeline framework for interpretable SCM. The framework comprises four modules: judicial feature sentence identification, case matching, feature sentence alignment, and conflict resolution. In contrast to current SCM methods, our framework first extracts feature sentences within a legal case that contain essential information. Then it conducts case matching based on these extracted features. Subsequently, our framework aligns the corresponding sentences in two legal cases to provide evidence of similarity. Where the results of case matching and feature sentence alignment conflict, the conflict resolution module resolves these inconsistencies. The experimental results show the effectiveness of our proposed framework, establishing a new benchmark for interpretable SCM.

    Common and different features of Chinese and Italian hydrogeological mapping guidelines

    Hydrogeologists have attempted to define common international guidelines for compiling high-quality hydrogeological maps since the second half of the last century, to address the lack of uniformity among national guidelines caused by the varied geological, hydrogeological, and climatic situations of different countries. With this aim, the China Geological Survey and the Geological Survey of Italy-ISPRA are undertaking cooperative research in implementing 1:50,000 scale hydrogeological survey and mapping at selected sites in both countries. The project intends to develop a new generation of hydrogeological and groundwater resource maps with descriptive effectiveness and consistency with field survey data. The project will promote improvements in the hydrogeological survey and mapping technologies of the two countries, and its results might even be agreed at a wider international level. Chinese and Italian hydrogeological guidelines share similar aspects as well as concerns: 1) the undertaking of field surveys at the 1:50,000 scale and the more detailed 1:25,000 scale; 2) building of a hydrogeological database; 3) publication of the official map in both paper and electronic form; 4) inclusion of several small scale maps inlaid at the margin of a main map in the hydrogeological map layout; 5) a comparable level in required survey quota. Furthermore, more attention will be paid to a 3D map, conceptual model, aquifer structure, groundwater cycle, and hydrogeological parameter description. The most important difference is the following. The hydrogeological mapping guidelines of Italy have integrated specifications for both survey and mapping, i.e. they have a structural layout in which survey contents are followed by mapping contents, reflecting a technical route of surveying for mapping. In contrast, there are no mapping contents in the current hydrogeological guidelines of China, and these need to be formulated. The Italian guidelines could provide important references for China in legend organization, mapping rules, survey quota, and so on. Finally, the collaboration between China and Italy is of great significance for the two ancient civilized countries sharing the “One Belt and One Road” international initiative.

    Exploring Post-Training Quantization of Protein Language Models

    Recent advancements in unsupervised protein language models (ProteinLMs), like ESM-1b and ESM-2, have shown promise in different protein prediction tasks. However, these models face challenges due to their high computational demands, significant memory needs, and latency, restricting their usage on devices with limited resources. To tackle this, we explore post-training quantization (PTQ) for ProteinLMs, focusing on ESMFold, a simplified version of AlphaFold based on the ESM-2 ProteinLM. Our study is the first attempt to quantize all weights and activations of ProteinLMs. We observed that the typical uniform quantization method performs poorly on ESMFold, causing a significant drop in TM-Score when using 8-bit quantization. We conducted extensive quantization experiments, uncovering unique challenges associated with ESMFold, particularly highly asymmetric activation ranges before Layer Normalization, which are difficult to represent using low-bit fixed-point formats. To address these challenges, we propose a new PTQ method for ProteinLMs, utilizing piecewise linear quantization for asymmetric activation values to ensure accurate approximation. We evaluated our method on protein structure prediction tasks, demonstrating that ESMFold can be accurately quantized to low-bit widths without compromising accuracy. Additionally, we applied our method to the contact prediction task, showcasing its versatility. In summary, our study introduces an innovative PTQ method for ProteinLMs, addressing specific quantization challenges and potentially leading to the development of more efficient ProteinLMs with significant implications for various protein-related applications.
    Comment: 8 pages, 4 figures
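To make the difficulty with asymmetric ranges concrete, here is a minimal sketch of plain min/max uniform quantization in pure Python. It is the baseline technique the abstract says performs poorly, not the piecewise linear method the paper proposes. The function name and the activation values are assumptions chosen for illustration.

```python
def quantize_uniform(values, num_bits=8):
    """Uniform affine quantization: map floats to integer codes in
    [0, 2**num_bits - 1] using the min/max range, then map back to floats."""
    lo, hi = min(values), max(values)
    levels = 2 ** num_bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    dequantized = [lo + c * scale for c in codes]
    return codes, dequantized, scale


# Hypothetical activations: several small values plus one large positive outlier.
activations = [-0.5, -0.2, 0.0, 0.1, 0.3, 60.0]
codes, approx, scale = quantize_uniform(activations, num_bits=8)

print(codes)  # the small values occupy only the lowest few codes; 0.1 and 0.3 share one
print(max(abs(a - b) for a, b in zip(activations, approx)))  # worst-case rounding error
```

Because a single step size has to span the whole range, one outlier stretches the grid and the densely populated region near zero is left with very few distinct codes.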

    The Greenhouse Gas Emission from Portland Cement Concrete Pavement Construction in China.

    This study proposes an inventory analysis method to evaluate the greenhouse gas (GHG) emissions from Portland cement concrete pavement construction, based on a case project in the west of China. The concrete pavement construction process was divided into three phases, namely raw material production, concrete manufacture and pavement onsite construction. The GHG emissions of the three phases are analyzed by a life cycle inventory method. The CO₂e is used to indicate the GHG emissions. The results show that for 1 km Portland cement concrete pavement construction, the total CO₂e is 8215.31 tons. Based on the evaluation results, the CO₂e of the raw material production phase is 7617.27 tons, accounting for 92.7% of the total GHG emissions; the CO₂e of the concrete manufacture phase is 598,033.10 kg, accounting for 7.2% of the total GHG emissions. Lastly, the CO₂e of the pavement onsite construction phase is 8396.59 kg, accounting for only 0.1% of the total GHG emissions. The main greenhouse gas is CO₂ in each phase, which accounts for more than 98% of total emissions. N₂O and CH₄ emissions are relatively insignificant.
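The phase shares quoted above can be rechecked with a few lines of arithmetic. The sketch below assumes the quoted tons are metric tons (1 ton = 1000 kg) and uses the quoted total as the denominator; the variable and function names are illustrative. Rounded results may differ from the quoted percentages in the last digit.

```python
# Figures quoted in the abstract, converted to tons (assumption: 1 ton = 1000 kg).
reported_total_t = 8215.31
phases_t = {
    "raw material production": 7617.27,
    "concrete manufacture": 598_033.10 / 1000,
    "pavement onsite construction": 8396.59 / 1000,
}


def share_percent(part, total):
    """Return part as a percentage of total."""
    return 100.0 * part / total


for name, tons in phases_t.items():
    share = share_percent(tons, reported_total_t)
    print(f"{name}: {tons:.2f} t, {share:.1f}% of the reported total")

print(f"sum of the three phases: {sum(phases_t.values()):.2f} t")
```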

    Cancer Nanotechnology: Enhancing Tumor Cell Response to Chemotherapy for Hepatocellular Carcinoma Therapy

    Hepatocellular carcinoma (HCC) is one of the deadliest cancers due to its complexity, recurrence after surgical resection, metastasis, and heterogeneity. In addition to sorafenib and lenvatinib, which are approved by the FDA for the treatment of HCC, various strategies including transarterial chemoembolization, radiotherapy, locoregional therapy, and chemotherapy have been investigated in clinics. Recently, cancer nanotechnology has attracted great attention for the treatment of various cancers including HCC. Both passive and active targeting are progressing at a steady rate. Herein, we describe the lessons learned from the pathogenesis of HCC and the understanding of targeted and non-targeted nanoparticles used for the delivery of small molecules, monoclonal antibodies, miRNAs, and peptides. Current efficacy is explored with the aim of enhancing the tumor cell response to chemotherapy. This review highlights the opportunities and challenges faced by nanotechnologies in contemporary hepatocellular carcinoma therapy, where personalized medicine is increasingly becoming the mainstay. The overall objective of this review is to enhance our understanding of the design and development of nanotechnology for the treatment of HCC.

    Investigation of genetic diversity and population structure of common wheat cultivars in northern China using DArT markers

    Background: In order to help establish heterotic groups of Chinese northern wheat cultivars (lines), Diversity arrays technology (DArT) markers were used to investigate the genetic diversity and population structure of Chinese common wheat (Triticum aestivum L.). Results: In total, 1637 of 7000 DArT markers were polymorphic and scored with high confidence among a collection of 111 lines composed mostly of cultivars and breeding lines from northern China. The polymorphism information content (PIC) of DArT markers ranged from 0.03 to 0.50, with an average of 0.40, with P > 80 (reliable markers). With principal-coordinates analysis (PCoA) of DArT data either from the whole genome or from the B-genome alone, all lines fell into one of two major groups reflecting 1RS/1BL type (1RS/1BL and non-1RS/1BL). Evidence of geographic clustering of genotypes was also observed using DArT markers from the A genome. Cluster analysis based on the unweighted pair-group method with arithmetic mean suggested the existence of two subgroups within the non-1RS/1BL group and four subgroups within the 1RS/1BL group. Furthermore, analysis of molecular variance (AMOVA) revealed highly significant (P < 0.001) genetic variance within and among subgroups and among groups. Conclusion: These results provide valuable information for selecting crossing parents and establishing heterotic groups in the Chinese wheat-breeding program.
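For readers unfamiliar with PIC, the sketch below computes it under the simplified definition 1 − Σ pᵢ², where pᵢ is the frequency of each marker state. The abstract does not state which PIC formula was used, so this choice is an assumption; it is at least consistent with the reported upper bound of 0.50 for two-state markers. The function name and the frequencies are hypothetical.

```python
def pic_simplified(state_frequencies):
    """PIC under the simplified definition 1 - sum(p_i ** 2).

    state_frequencies: frequencies of each marker state (e.g. present/absent),
    which must sum to 1.
    """
    total = sum(state_frequencies)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("state frequencies must sum to 1")
    return 1.0 - sum(p * p for p in state_frequencies)


# Hypothetical presence/absence frequencies for a dominant, two-state marker.
print(pic_simplified([0.5, 0.5]))  # evenly split marker: the maximum for two states
print(pic_simplified([0.9, 0.1]))  # skewed marker: less informative
```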